Mapping an Automated Survey Coding Task into a Probabilistic Text Categorization Framework

نویسندگان

  • Daniela Giorgetti
  • Irina Prodanof
  • Fabrizio Sebastiani
چکیده

This paper describes how to apply a probabilistic Text Categorization method to a different and new domain where documents are answers to open end questionnaires and codes viewed as categories consist of a hierarchical model. A reduced size training set may be used taking advantage of the hierarchical organization of categories. The system developed in this framework aims at helping psychologists in the evaluation of open end surveys inquiring about job candidates’ competencies.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automating survey coding by multiclass text categorization techniques

Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). This task is usually carried out to group respondents according to a predefined scheme based on their answers. Survey coding has several applications, especially in the social sciences, ranging from the simple class...

متن کامل

Text Categorization

Text categorization (also known as text classification, or topic spotting) is the task of automatically sorting a set of documents into categories from a predefined set. This task has several applications, including automated indexing of scientific articles according to predefined thesauri of technical terms, filing patents into patent directories, selective dissemination of information to info...

متن کامل

Replace Manual Coding of Customer Survey Comments with Text Mining: A Story of Discovery with Text as Data in the Public Sector

A common approach to analyzing open-ended customer survey data is to manually assign codes to text observations. Basic descriptive statistics of the codes are then calculated. Subsequent reporting is an attempt to explain customer opinions numerically. While this approach provides numbers and percentages, it offers little insight. In fact, this method is tedious and time-consuming and can even ...

متن کامل

A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization

The Rocchio relevance feedback algorithm is one of the most popular and widely applied learning methods from information retrieval. Here, a probabilistic analysis of this algorithm is presented in a text categorization framework. The analysis gives theoretical insight into the heuristics used in the Rocchio algorithm, particularly the word weighting scheme and the similarity metric. It also sug...

متن کامل

Two-dimensional Clustering for Text Categorization

We propose a new method to improve the accuracy of Text Categorization using twodimensional clustering. In a number of previous probabilistic approaches, texts in the same category are implicitly assumed to be generated from an identical distribution. We empirically show that this assumption is not accurate, and propose a new framework based on twodimensional clustering to alleviate this proble...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002